Information extraction from non-segmented text (on the material of weather forecast telegrams)
Authors
Abstract
An approach to text analysis and information extraction that is specific to both the domain and the sublanguage is proposed. The texts under consideration are weather forecast telegrams written in Russian. Telegrams are an example of a deviant text type: they lack text segmentation markers and contain many abbreviations as well as syntactic and spelling mistakes. The presented work addresses the problem of text segmentation: a procedure for recovering text structure is proposed that yields a sequence of topically coherent text fragments suitable for semantic interpretation. Topical mechanisms combined with narrative structure analysis allow disambiguation of the attachment of circumstantial (locative and temporal) modifiers.
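The idea of cutting unpunctuated text into topically coherent fragments can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `TOPIC_LEXICON` cue words, the `topic_of` and `segment` helpers, and the English stand-in telegram are hypothetical, and the rule used (start a new fragment when a cue word signals a different topic) is a deliberately naive substitute, not the procedure proposed in the paper.

```python
# Toy illustration only: the lexicon and the sample text are invented,
# and this is NOT the segmentation procedure described in the paper.
TOPIC_LEXICON = {
    "wind": {"wind", "gusts", "m/s"},
    "precipitation": {"rain", "snow", "showers"},
    "temperature": {"temp", "frost", "degrees"},
}


def topic_of(token):
    """Return the topic a token signals, or None if it is not a cue word."""
    for topic, cues in TOPIC_LEXICON.items():
        if token.lower() in cues:
            return topic
    return None


def segment(tokens):
    """Split a flat token list into (topic, tokens) fragments.

    A new fragment starts when a cue word signals a topic different from
    the current one. Tokens without a cue stay in the current fragment,
    and tokens before the first cue are attached to the first fragment.
    """
    fragments = []
    current_topic, current = None, []
    for tok in tokens:
        t = topic_of(tok)
        if t is not None and t != current_topic:
            if current_topic is not None:
                fragments.append((current_topic, current))
                current = []
            current_topic = t
        current.append(tok)
    if current:
        fragments.append((current_topic, current))
    return fragments


# Hypothetical unsegmented telegram (English stand-in for the Russian text).
telegram = "night rain heavy wind north gusts 15 m/s temp minus 5"
for topic, words in segment(telegram.split()):
    print(topic, words)
```

In this sketch the temporal modifier `night` is simply attached to the first fragment. Deciding which fragment such locative and temporal modifiers really belong to is exactly the attachment ambiguity the abstract refers to, and the sketch makes no attempt to resolve it.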
Similar resources
Learning to Extract Text-Based Information from the World Wide Web
There is a wealth of information to be mined from narrative text on the World Wide Web. Unfortunately, standard natural language processing (NLP) extraction techniques expect full, grammatical sentences, and perform poorly on the choppy sentence fragments that are often found on web pages. This paper introduces Webfoot, a preprocessor that parses web pages into logically coherent segments based...
A new text-mining method for extracting user context information to improve the ranking of search engine results
Today, the importance of text processing and its uses is well known among researchers and students. The volume of textual and documentary material increases day by day, so we need effective ways to store it and retrieve information from it. For example, search engines such as Google, Yahoo, and Bing need to read a great many web documents and retrieve the most similar ones to the user ...
Presenting a method for extracting structured domain-dependent information from Farsi Web pages
Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...
A new approach to wind turbine power generation forecasting, using weather radar data based on Hidden Markov Model
Wind is one of the most important and influential atmospheric phenomena and is known as a significant clean energy resource. Unlike other atmospheric parameters, wind has complex behavior and intermittent characteristics. Local phenomena can be accompanied by wind that is strong, unpredicted, and damaging. Weather radars are capable of detecting and displaying storm-related ...
A New Method for Improving Computational Cost of Open Information Extraction Systems Using Log-Linear Model
Information extraction (IE) is the process of automatically producing a structured representation from unstructured or semi-structured text. It is a long-standing challenge in natural language processing (NLP) that has been intensified by the growing volume, heterogeneity, and unstructured form of information. One of the core information extraction tasks is relation extraction wh...